Part of iPyMacLern project.
Copyright (C) 2015 by Eka A. Kurniawan
eka.a.kurniawan(ta)gmail(tod)com
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
In [1]:
import sys
print("Python %d.%d.%d" % (sys.version_info.major, \
sys.version_info.minor, \
sys.version_info.micro))
In [2]:
import numpy as np
print("NumPy %s" % np.__version__)
In [3]:
import matplotlib
import matplotlib.pyplot as plt
print("matplotlib %s" % matplotlib.__version__)
In [4]:
# Display graph inline
%matplotlib inline
# Display graph in 'retina' format for Mac with retina display. Others, use PNG or SVG format.
%config InlineBackend.figure_format = 'retina'
#%config InlineBackend.figure_format = 'PNG'
#%config InlineBackend.figure_format = 'SVG'
# For displaying 3D graph
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
The data we use if form different outlets accross multiple cities consists of the profit of each outlet based on the number of population in the city the outlet located. Using this known data, we can perform linear regression analysis among the two variables (profit vs population) and then we can use the result to predict the profit we are going to get for our new outlet if we know the population of the city where the new outlet is going to be located.
In [5]:
filename = 'ex1data1.txt'
data = np.recfromcsv(filename, delimiter=',', names=['population', 'profit'])
In [6]:
plt.scatter(data['population'], data['profit'], color='red', s=2)
plt.title('Restaurant Sells')
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.show()
In [7]:
# Total training samples
m = len(data)
# Add column of ones to variable x
X = np.append(np.ones([m,1]),data['population'].reshape(m,1),1)
# Output
y = data['profit']
# Initialize fitting parameters
theta = np.zeros([2, 1])
For $x$ as input variable (feature) and $y$ as output/target variable, we have cost function $J$ of parameter $\theta$ as follow.
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^2$$Where hypothesis $h_{\theta}(x)$ is the linear model as follow.
$$h_{\theta}(x) = \theta^Tx = \theta_0 + \theta_1x$$And $m$ is total training samples and $i$ is the index.
In [8]:
def compute_cost(X, y, theta):
# Total training samples
m = len(y)
h = theta[0] + (theta[1] * X[:,1])
J = (1 / (2*m)) * sum((h - y) ** 2)
return J
In [9]:
J = compute_cost(X, y, theta)
print(J)
Our objective is to have the parameters ($\theta_0$ and $\theta_1$) to produce a hypothesis $h_{\theta}(x)$ as close as possible to target variable $y$. As we have defined our cost function $J(\theta)$ as the error of the two therefore all we need to do is to minimize the cost function $\left(\underset{\theta_0, \theta_1}{\text{minimize}} \hspace{2mm} J(\theta_0, \theta_1)\right)$.
Following is the algorithm where $\alpha$ is the learning rate.
repeat until convergence {
$\hspace{10mm} \theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta_0, \theta_1) \hspace{10mm} (\text{for } j=0 \text{ and } j=1)$
}
Derive cost function $J(\theta_j)$ for both $j=0$ and $j=1$.
$j = 0$ : $$\frac{\partial}{\partial\theta_0} J(\theta_0, \theta_1) =$$ $$\frac{\partial}{\partial\theta_0} \frac{1}{2m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^2 =$$ $$\frac{\partial}{\partial\theta_0} \frac{1}{2m} \sum_{i=1}^{m}\left(\theta_0 + \theta_1x^{(i)}-y^{(i)}\right)^2 =$$ $$\frac{1}{m} \sum_{i=1}^{m}\left(\theta_0 + \theta_1x^{(i)} - y^{(i)}\right) =$$ $$\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)$$
$j = 1$ : $$\frac{\partial}{\partial\theta_1} J(\theta_0, \theta_1) =$$ $$\frac{\partial}{\partial\theta_1} \frac{1}{2m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^2 =$$ $$\frac{\partial}{\partial\theta_1} \frac{1}{2m} \sum_{i=1}^{m}\left(\theta_0 + \theta_1x^{(i)}-y^{(i)}\right)^2 =$$ $$\frac{1}{m} \sum_{i=1}^{m}\left(\theta_0 + \theta_1x^{(i)} - y^{(i)}\right) . x^{(i)} =$$ $$\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right) . x^{(i)}$$
For this linear case, the algorithm would be as follow.
repeat until convergence {
$\hspace{10mm} \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)$
$\hspace{10mm} \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right) . x^{(i)}$
}
In [10]:
def perform_gradient_descent(X, y, theta, alpha, iterations):
# Total training samples
m = len(y)
# History of costs
J_history = np.zeros([iterations, 1])
for i in range(iterations):
h = theta[0] + (theta[1] * X[:,1])
error = h - y
theta_tmp1 = theta[0] - ((alpha / m) * sum(error))
theta_tmp2 = theta[1] - ((alpha / m) * sum(error * X[:,1]))
theta[0] = theta_tmp1
theta[1] = theta_tmp2
J_history[i] = compute_cost(X, y, theta)
return theta, J_history
In [11]:
iterations = 1500
alpha = 0.01
theta, J_history = perform_gradient_descent(X, y, theta, alpha, iterations)
In [12]:
theta
Out[12]:
In [13]:
J_history
Out[13]:
In [14]:
plt.plot(J_history)
plt.title('Cost History')
plt.xlabel('Iterations')
plt.ylabel(r'Cost $J(\theta)$')
plt.show()
In [15]:
plt.scatter(data['population'], data['profit'], color='red', s=2)
plt.plot(data['population'], np.dot(X, theta), color='blue')
plt.title('Restaurant Sells')
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.show()
Predict profit for cities with population of 35,000 and 70,000 people.
In [16]:
ttl_population = 35000
print('For population = 35,000, we predict a profit of %s' % \
int(np.dot([1, ttl_population/10000], theta) * 10000))
In [17]:
ttl_population = 70000
print('For population = 70,000, we predict a profit of %s' % \
int(np.dot([1, ttl_population/10000], theta) * 10000))
In [18]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.view_init(azim =-120)
theta0_vals = np.linspace(-10, 10, 100)
theta1_vals = np.linspace(-1, 4, 100)
theta0_mesh, theta1_mesh = np.meshgrid(theta0_vals, theta1_vals)
zs = np.array([compute_cost(X, y, [theta0_val,theta1_val]) for theta0_val,theta1_val in zip(np.ravel(theta0_mesh), np.ravel(theta1_mesh))])
Z = zs.reshape(theta0_mesh.shape)
ax.plot_surface(theta0_mesh, theta1_mesh, Z, cmap=cm.coolwarm)
ax.set_xlabel('Theta 0')
ax.set_ylabel('Theta 1')
ax.set_zlabel(r'Cost $J(\theta)$')
plt.show()